131 research outputs found

    PISCES: recent improvements to a PDB sequence culling server

    PISCES is a database server for producing lists of sequences from the Protein Data Bank (PDB) using a number of entry- and chain-specific criteria and mutual sequence identity. Our goal in culling the PDB is to provide the longest list possible of the highest resolution structures that fulfill the sequence identity and structural quality cut-offs. The new PISCES server uses a combination of PSI-BLAST and structure-based alignments to determine sequence identities. Structure alignment produces more complete alignments and therefore more accurate sequence identities than PSI-BLAST. PISCES now allows a user to cull the PDB by-entry in addition to the standard culling by individual chains. In this scenario, a list will contain only entries that do not have a chain that has a sequence identity to any chain in any other entry in the list over the sequence identity cut-off. PISCES also provides fully annotated sequences including gene name and species. The server allows a user to cull an input list of entries or chains, so that other criteria, such as function, can be used. Results from a search on the re-engineered RCSB's site for the PDB can be entered into the PISCES server by a single click, combining the powerful searching abilities of the PDB with PISCES's utilities for sequence culling. The server's data are updated weekly. The server is available at
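The chain-level culling described in this abstract can be illustrated with a small greedy sketch. This is not the PISCES implementation: the `Chain` record, the `identity` callback, and the default cut-offs are hypothetical names and values chosen for illustration, and a greedy pass that visits the best-resolution chains first is only one simple heuristic for building a long, non-redundant list.

```python
from typing import Callable, Iterable, List, NamedTuple


class Chain(NamedTuple):
    """Hypothetical record for one PDB chain (not a PISCES data type)."""
    chain_id: str
    resolution: float  # angstroms; lower means higher resolution
    length: int        # number of residues


def cull_chains(
    chains: Iterable[Chain],
    identity: Callable[[str, str], float],
    max_identity: float = 25.0,   # assumed percent-identity cut-off
    max_resolution: float = 3.0,  # assumed resolution cut-off
) -> List[Chain]:
    """Greedily keep chains whose identity to every kept chain is within the cut-off."""
    # Apply the structural quality filter first.
    candidates = [c for c in chains if c.resolution <= max_resolution]
    kept: List[Chain] = []
    # Visit the best-resolution chains first; break ties in favour of longer chains.
    for chain in sorted(candidates, key=lambda c: (c.resolution, -c.length)):
        if all(identity(chain.chain_id, k.chain_id) <= max_identity for k in kept):
            kept.append(chain)
    return kept
```

The `identity` callback is where pairwise percent identities (from whatever alignment method is used) would be supplied; the sketch assumes it is symmetric.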

    Charge Asymmetry in the Proteins of the Outer Membrane

    Accurate Structural Correlations from Maximum Likelihood Superpositions

    The cores of globular proteins are densely packed, resulting in complicated networks of structural interactions. These interactions in turn give rise to dynamic structural correlations over a wide range of time scales. Accurate analysis of these complex correlations is crucial for understanding biomolecular mechanisms and for relating structure to function. Here we report a highly accurate technique for inferring the major modes of structural correlation in macromolecules using likelihood-based statistical analysis of sets of structures. This method is generally applicable to any ensemble of related molecules, including families of nuclear magnetic resonance (NMR) models, different crystal forms of a protein, and structural alignments of homologous proteins, as well as molecular dynamics trajectories. Dominant modes of structural correlation are determined using principal components analysis (PCA) of the maximum likelihood estimate of the correlation matrix. The correlations we identify are inherently independent of the statistical uncertainty and dynamic heterogeneity associated with the structural coordinates. We additionally present an easily interpretable method (“PCA plots”) for displaying these positional correlations by color-coding them onto a macromolecular structure. Maximum likelihood PCA of structural superpositions, and the structural PCA plots that illustrate the results, will facilitate the accurate determination of dynamic structural correlations analyzed in diverse fields of structural biology
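As a rough illustration of extracting a dominant mode of correlation, the sketch below runs PCA on a correlation matrix by power iteration. It deliberately swaps in the ordinary sample correlation matrix of already-superposed coordinates rather than the maximum likelihood estimate the abstract describes, so it shows only the PCA step. The function names and the toy input layout (one row per structure, one column per coordinate) are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def correlation_matrix(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample correlation matrix of the columns (one row per structure)."""
    n = len(samples)
    d = len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in samples]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    sd = [math.sqrt(cov[i][i]) for i in range(d)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(d)] for i in range(d)]


def dominant_mode(
    matrix: Sequence[Sequence[float]], iterations: int = 200
) -> Tuple[float, List[float]]:
    """Leading eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    d = len(matrix)
    # Non-uniform start so it is unlikely to be orthogonal to the leading mode.
    vec = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            raise ValueError("start vector lies in the null space of the matrix")
        vec = [x / norm for x in product]
    product = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(vec[i] * product[i] for i in range(d))  # Rayleigh quotient
    return eigenvalue, vec
```

Components of the returned eigenvector that share a sign move together in that mode, and components with opposite signs move in opposition. That per-position information is the kind of quantity that could be color-coded onto a structure.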

    Rapid calcium-dependent activation of Aurora-A kinase

    Oncogenic hyperactivation of the mitotic kinase Aurora-A (AurA) in cancer is associated with genomic instability. Increasing evidence indicates that AurA also regulates critical processes in normal interphase cells, but the source of such activity has been obscure. We report here that multiple stimuli causing release of Ca2+ from intracellular endoplasmic reticulum stores rapidly and transiently activate AurA, without requirement for second messengers. This activation is mediated by direct Ca2+-dependent calmodulin (CaM) binding to multiple motifs on AurA. On the basis of structure–function analysis and molecular modelling, we map two primary regions of CaM-AurA interaction to unfolded sequences in the AurA N- and C-termini. This unexpected mechanism for AurA activation provides a new context for evaluating the function of AurA and its inhibitors in normal and cancerous cells

    Neighbor-Dependent Ramachandran Probability Distributions of Amino Acids Developed from a Hierarchical Dirichlet Process Model

    Distributions of the backbone dihedral angles of proteins have been studied for over 40 years. While many statistical analyses have been presented, only a handful of probability densities are publicly available for use in structure validation and structure prediction methods. The available distributions differ in a number of important ways, which determine their usefulness for various purposes. These include: 1) input data size and criteria for structure inclusion (resolution, R-factor, etc.); 2) filtering of suspect conformations and outliers using B-factors or other features; 3) secondary structure of input data (e.g., whether helix and sheet are included; whether beta turns are included); 4) the method used for determining probability densities ranging from simple histograms to modern nonparametric density estimation; and 5) whether they include nearest neighbor effects on the distribution of conformations in different regions of the Ramachandran map. In this work, Ramachandran probability distributions are presented for residues in protein loops from a high-resolution data set with filtering based on calculated electron densities. Distributions for all 20 amino acids (with cis and trans proline treated separately) have been determined, as well as 420 left-neighbor and 420 right-neighbor dependent distributions. The neighbor-independent and neighbor-dependent probability densities have been accurately estimated using Bayesian nonparametric statistical analysis based on the Dirichlet process. In particular, we used hierarchical Dirichlet process priors, which allow sharing of information between densities for a particular residue type and different neighbor residue types. The resulting distributions are tested in a loop modeling benchmark with the program Rosetta, and are shown to improve protein loop conformation prediction significantly. The distributions are available at http://dunbrack.fccc.edu/hdp
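The backbone dihedral angles that populate a Ramachandran map can be computed from four consecutive atom positions. The sketch below uses the standard projection-and-`atan2` formula; the function names and the plain-tuple coordinate format are assumptions made for illustration, and the code is unrelated to the software behind the published distributions.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    axis = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    k0 = _dot(b0, axis)
    k2 = _dot(b2, axis)
    v = _sub(b0, (k0 * axis[0], k0 * axis[1], k0 * axis[2]))
    w = _sub(b2, (k2 * axis[0], k2 * axis[1], k2 * axis[2]))
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev: Vec, n: Vec, ca: Vec, c: Vec, n_next: Vec) -> Tuple[float, float]:
    """phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for residue i."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

Neighbor-dependent distributions of the kind described above would additionally condition each (phi, psi) pair on the residue type to the left or right. That bookkeeping is omitted here.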

    Candidate Variants in DNA Replication and Repair Genes in Early-Onset Renal Cell Carcinoma Patients Referred for Germline Testing

    Background: Early-onset renal cell carcinoma (eoRCC) is typically associated with pathogenic germline variants (PGVs) in RCC familial syndrome genes. However, most eoRCC patients lack PGVs in familial RCC genes and their genetic risk remains undefined. Methods: Here, we analyzed biospecimens from 22 eoRCC patients that were seen at our institution for genetic counseling and tested negative for PGVs in RCC familial syndrome genes. Results: Analysis of whole-exome sequencing (WES) data found enrichment of candidate pathogenic germline variants in DNA repair and replication genes, including multiple DNA polymerases. Induction of DNA damage in peripheral blood monocytes (PBMCs) significantly elevated numbers of γH2AX foci, a marker of double-stranded breaks, in PBMCs from eoRCC patients versus PBMCs from matched cancer-free controls. Knockdown of candidate variant genes in Caki RCC cells increased γH2AX foci. Immortalized patient-derived B cell lines bearing the candidate variants in DNA polymerase genes (POLD1, POLH, POLE, POLK) had DNA replication defects compared to control cells. Renal tumors carrying these DNA polymerase variants were microsatellite stable but had a high mutational burden. Direct biochemical analysis of the variant Pol δ and Pol η polymerases revealed defective enzymatic activities. Conclusions: Together, these results suggest that constitutional defects in DNA repair underlie a subset of eoRCC cases. Screening patient lymphocytes to identify these defects may provide insight into mechanisms of carcinogenesis in a subset of genetically undefined eoRCCs. Evaluation of DNA repair defects may also provide insight into the cancer initiation mechanisms for subsets of eoRCCs and lay the foundation for targeting DNA repair vulnerabilities in eoRCC

    Lipid Exchange Mechanism of the Cholesteryl Ester Transfer Protein Clarified by Atomistic and Coarse-grained Simulations

    Cholesteryl ester transfer protein (CETP) transports cholesteryl esters, triglycerides, and phospholipids between different lipoprotein fractions in blood plasma. The inhibition of CETP has been shown to be a sound strategy to prevent and treat the development of coronary heart disease. We employed molecular dynamics simulations to unravel the mechanisms associated with the CETP-mediated lipid exchange. To this end we used both atomistic and coarse-grained models whose results were consistent with each other. We found CETP to bind to the surface of high density lipoprotein (HDL) -like lipid droplets through its charged and tryptophan residues. Upon binding, CETP rapidly (in about 10 ns) induced the formation of a small hydrophobic patch to the phospholipid surface of the droplet, opening a route from the core of the lipid droplet to the binding pocket of CETP. This was followed by a conformational change of helix X of CETP to an open state, in which we found the accessibility of cholesteryl esters to the C-terminal tunnel opening of CETP to increase. Furthermore, in the absence of helix X, cholesteryl esters rapidly diffused into CETP through the C-terminal opening. The results provide compelling evidence that helix X acts as a lid which conducts lipid exchange by alternating the open and closed states. The findings have potential for the design of novel molecular agents to inhibit the activity of CETP

    Investigation of the causal etiology in a patient with T-B+NK+ immunodeficiency

    Newborn screening for severe combined immunodeficiency (SCID) has not only accelerated diagnosis and improved treatment for affected infants, but also led to identification of novel genes required for human T cell development. A male proband had SCID newborn screening showing very low T cell receptor excision circles (TRECs), a biomarker for thymic output of nascent T cells. He had persistent profound T lymphopenia, but normal numbers of B and natural killer (NK) cells. Despite an allogeneic hematopoietic stem cell transplant from his brother, he failed to develop normal T cells. Targeted resequencing excluded known SCID genes; however, whole exome sequencing (WES) of the proband and parents revealed a maternally inherited X-linked missense mutation in MED14 (MED14V763A), a component of the mediator complex. Morpholino (MO)-mediated loss of MED14 function attenuated T cell development in zebrafish. Moreover, this arrest was rescued by ectopic expression of cDNA encoding the wild type human MED14 ortholog, but not by MED14V763A, suggesting that the variant impaired MED14 function. Modeling of the equivalent mutation in mouse (Med14V769A) did not disrupt T cell development at baseline. However, repopulation of peripheral T cells upon competitive bone marrow transplantation was compromised, consistent with the incomplete T cell reconstitution experienced by the proband upon transplantation with bone marrow from his healthy male sibling, who was found to have the same MED14V763A variant. Suspecting that the variable phenotypic expression between the siblings was influenced by further mutation(s), we sought to identify genetic variants present only in the affected proband. Indeed, WES revealed a mutation in the L1 cell adhesion molecule (L1CAMQ498H); however, introducing that mutation in vivo in mice did not disrupt T cell development. Consequently, immunodeficiency in the proband may depend upon additional, unidentified gene variants

    A Mathematical Framework for Protein Structure Comparison

    Comparison of protein structures is important for revealing the evolutionary relationship among proteins, predicting protein functions and predicting protein structures. Many methods have been developed in the past to align two or multiple protein structures. Despite the importance of this problem, rigorous mathematical or statistical frameworks have seldom been pursued for general protein structure comparison. One notable issue in this field is that with many different distances used to measure the similarity between protein structures, none of them are proper distances when protein structures of different sequences are compared. Statistical approaches based on those non-proper distances or similarity scores as random variables are thus not mathematically rigorous. In this work, we develop a mathematical framework for protein structure comparison by treating protein structures as three-dimensional curves. Using an elastic Riemannian metric on spaces of curves, geodesic distance, a proper distance on spaces of curves, can be computed for any two protein structures. In this framework, protein structures can be treated as random variables on the shape manifold, and means and covariance can be computed for populations of protein structures. Furthermore, these moments can be used to build Gaussian-type probability distributions of protein structures for use in hypothesis testing. The covariance of a population of protein structures can reveal the population-specific variations and be helpful in improving structure classification. With curves representing protein structures, the matching is performed using elastic shape analysis of curves, which can effectively model conformational changes and insertions/deletions. We show that our method performs comparably with commonly used methods in protein structure classification on a large manually annotated data set

    Nucleotide Binding Switches the Information Flow in Ras GTPases

    The Ras superfamily comprises many guanine nucleotide-binding proteins (G proteins) that are essential to intracellular signal transduction. The guanine nucleotide-dependent intrinsic flexibility patterns of five G proteins were investigated in atomic detail through Molecular Dynamics simulations of the GDP- and GTP-bound states (SGDP and SGTP, respectively). For all the considered systems, the intrinsic flexibility of SGDP was higher than that of SGTP, suggesting that Guanine Exchange Factor (GEF) recognition and nucleotide switch require higher amplitude motions than effector recognition or GTP hydrolysis. Functional mode, dynamic domain, and interaction energy correlation analyses highlighted significant differences in the dynamics of small G proteins and Gα proteins, especially in the inactive state. Indeed, SGDP of Gαt, is characterized by a more extensive energy coupling between nucleotide binding site and distal regions involved in GEF recognition compared to small G proteins, which attenuates in the active state. Moreover, mechanically distinct domains implicated in nucleotide switch could be detected in the presence of GDP but not in the presence of GTP. Finally, in small G proteins, functional modes are more detectable in the inactive state than in the active one and involve changes in solvent exposure of two highly conserved amino acids in switches I and II involved in GEF recognition. The average solvent exposure of these amino acids correlates in turn with the rate of GDP release, suggesting for them either direct or indirect roles in the process of nucleotide switch. Collectively, nucleotide binding changes the information flow through the conserved Ras-like domain, where GDP enhances the flexibility of mechanically distinct portions involved in nucleotide switch, and favors long distance allosteric communication (in Gα proteins), compared to GTP